traditional model
Comparing energy consumption and accuracy in text classification inference
Zschache, Johannes, Hartwig, Tilman
The increasing deployment of large language models (LLMs) in natural language processing (NLP) tasks raises concerns about energy efficiency and sustainability. While prior research has largely focused on energy consumption during model training, the inference phase has received comparatively less attention. This study systematically evaluates the trade-offs between model accuracy and energy consumption in text classification inference across various model architectures and hardware configurations. Our empirical analysis shows that the best-performing model in terms of accuracy can also be energy-efficient, while larger LLMs tend to consume significantly more energy with lower classification accuracy. We observe substantial variability in inference energy consumption (from below 1 mWh to above 1 kWh), influenced by model type, model size, and hardware specifications. Additionally, we find a strong correlation between inference energy consumption and model runtime, indicating that execution time can serve as a practical proxy for energy usage in settings where direct measurement is not feasible. These findings have implications for sustainable AI development, providing actionable insights for researchers, industry practitioners, and policymakers seeking to balance performance and resource efficiency in NLP applications.
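The reported runtime-energy correlation suggests a simple measurement recipe. As a rough illustration (not the authors' setup), the sketch below polls GPU power via NVML while timing a batch of classification calls and multiplies average power by wall-clock time; the classify_batch helper and the batches iterable are hypothetical placeholders.

```python
# Hedged sketch: estimate inference energy as (mean GPU power) x (wall-clock time).
# Requires the pynvml bindings; classify_batch is a hypothetical stand-in for the
# text-classification call being profiled.
import time
import pynvml

def estimate_inference_energy(classify_batch, batches, device_index=0):
    """Return (runtime in seconds, estimated energy in watt-hours)."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    power_samples_w = []
    start = time.perf_counter()
    for batch in batches:
        classify_batch(batch)  # run inference on one batch
        # instantaneous board power in milliwatts -> watts
        power_samples_w.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
    runtime_s = time.perf_counter() - start
    pynvml.nvmlShutdown()
    mean_power_w = sum(power_samples_w) / max(len(power_samples_w), 1)
    return runtime_s, mean_power_w * runtime_s / 3600.0  # W * s -> Wh
```

If only runtime is available, the paper's observation implies it can stand in for the energy figure, up to a hardware-dependent scaling factor.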
- Europe > Germany > Saxony > Leipzig (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
52130c418d4f02c74f74a5bc1f8020b2-AuthorFeedback.pdf
We thank all the reviewers for their positive comments, and address their major questions and comments below. Clarifications will be added in the revision and we will keep improving our draft.
Reviewer #1: We thank the reviewer for the positive review. The remarks raised are addressed below. We are happy to release our code for better reproducibility.
A Unifying Framework for Semiring-Based Constraint Logic Programming With Negation (full version)
Spaans, Jeroen, Heyninck, Jesse
Constraint Logic Programming (CLP) is a logic programming formalism used to solve problems requiring the consideration of constraints, like resource allocation and automated planning and scheduling. It has previously been extended in various directions, for example to support fuzzy constraint satisfaction, uncertainty, or negation, with different notions of semiring being used as a unifying abstraction for these generalizations. None of these extensions have studied clauses with negation allowed in the body. We investigate an extension of CLP which unifies many of these extensions and allows negation in the body. We provide semantics for such programs, using the framework of approximation fixpoint theory, and give a detailed overview of the impacts of properties of the semirings on the resulting semantics. As such, we provide a unifying framework that captures existing approaches and allows extending them with a more expressive language.
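For readers unfamiliar with the algebraic backbone, the standard semiring definition that such unifying frameworks build on is sketched below; the particular instances listed are common examples, not necessarily the exact ones treated in the paper.

```latex
% A semiring, the usual abstraction behind weighted/soft CLP:
% \oplus combines alternative derivations, \otimes combines constraints within one derivation.
\[
  (S, \oplus, \otimes, \mathbf{0}, \mathbf{1}), \quad
  (S, \oplus, \mathbf{0}) \text{ a commutative monoid}, \quad
  (S, \otimes, \mathbf{1}) \text{ a monoid},
\]
\[
  a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c), \qquad
  \mathbf{0} \otimes a = a \otimes \mathbf{0} = \mathbf{0}.
\]
% Typical instances: Boolean (\{0,1\}, \lor, \land), fuzzy ([0,1], \max, \min),
% and probabilistic ([0,1], \max, \cdot).
```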
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
- Europe > Netherlands (0.04)
- (5 more...)
Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography
The application of artificial intelligence (AI) in medical imaging has revolutionized diagnostic practices, enabling advanced analysis and interpretation of radiological data. This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography, focusing on COVID-19, lung opacity, and viral pneumonia. While deep learning models, particularly convolutional neural networks and vision transformers, learn directly from image data, radiomics-based models extract handcrafted features, offering potential advantages in data-limited scenarios. We systematically compared the diagnostic performance of various AI models, including Decision Trees, Gradient Boosting, Random Forests, Support Vector Machines, and Multi-Layer Perceptrons for radiomics, against state-of-the-art deep learning models such as InceptionV3, EfficientNetL, and ConvNeXtXLarge. Performance was evaluated across multiple sample sizes. At 24 samples, EfficientNetL achieved an AUC of 0.839, outperforming SVM (AUC = 0.762). At 4000 samples, InceptionV3 achieved the highest AUC of 0.996, compared to 0.885 for Random Forest. A Scheirer-Ray-Hare test confirmed significant main and interaction effects of model type and sample size on all metrics. Post hoc Mann-Whitney U tests with Bonferroni correction further revealed consistent performance advantages for deep learning models across most conditions. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI. Deep learning models demonstrated higher performance and better scalability with increasing data availability, while radiomics-based models may remain useful in low-data contexts. This study addresses a critical gap in AI-based diagnostic research by offering practical guidance for deploying AI models across diverse clinical environments.
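The post hoc comparison described above (pairwise Mann-Whitney U tests with a Bonferroni-adjusted threshold) can be reproduced in a few lines; the sketch below assumes per-model AUC scores collected over repeated runs and is illustrative rather than the authors' exact pipeline.

```python
# Hedged sketch: pairwise Mann-Whitney U tests over per-run AUC scores,
# with a Bonferroni-corrected significance threshold.
from itertools import combinations
from scipy.stats import mannwhitneyu

def pairwise_mwu(auc_by_model, alpha=0.05):
    """auc_by_model: dict mapping model name -> list of AUCs over repeated runs."""
    pairs = list(combinations(auc_by_model, 2))
    corrected_alpha = alpha / len(pairs)  # Bonferroni correction
    results = {}
    for a, b in pairs:
        stat, p = mannwhitneyu(auc_by_model[a], auc_by_model[b], alternative="two-sided")
        results[(a, b)] = (p, p < corrected_alpha)
    return corrected_alpha, results

# Example call with made-up scores:
# pairwise_mwu({"InceptionV3": [0.99, 0.98, 0.99], "RandomForest": [0.88, 0.89, 0.87]})
```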
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks
Hekler, Achim, Kuhn, Lukas, Buettner, Florian
Reliable uncertainty calibration of neural networks is crucial for safety-critical applications. Current calibration research has two major limitations: the exclusive evaluation of large web-scraped datasets and the lack of investigation of contemporary high-performance models with recent architectural and training innovations. To address this gap, we conducted a systematic investigation of different model generations on diverse datasets, revealing insights that challenge established calibration paradigms. Our results show that current-generation models consistently exhibit underconfidence in their in-distribution predictions - contrasting with the overconfidence typically reported in earlier model generations - while showing improved calibration under distribution shift. Although post-hoc calibration techniques significantly improve in-distribution calibration performance, their effectiveness progressively diminishes with increasing distribution shift, ultimately becoming counterproductive in extreme cases. Critically, extending our analysis to four diverse biomedical imaging datasets using transfer learning highlights the limited transferability of insights from web-scraped benchmarks. In these domains, convolutional architectures consistently achieve superior calibration compared to transformer-based counterparts, irrespective of model generation. Our findings underscore that model advancements have complex effects on calibration, challenging simple narratives of monotonic improvement, and emphasize the critical need for domain-specific architectural evaluation beyond standard benchmarks. This dual requirement is particularly critical in high-stakes domains such as medical diagnosis [1, 2], autonomous driving [3], and financial decision-making [4], where misaligned confidence estimates can lead to incorrect decisions with potentially severe or life-threatening consequences. Model calibration offers a systematic framework for evaluating the reliability of a model's predictive confidence [5, 6]. In a well-calibrated model, confidence scores align closely with the true likelihood of correctness.
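As a concrete reference for the calibration quantities discussed, the sketch below computes the standard expected calibration error (ECE) over equal-width confidence bins and fits temperature scaling, one common post-hoc method; it is a generic illustration, not the authors' code.

```python
# Hedged sketch: expected calibration error (ECE) and temperature scaling.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

def expected_calibration_error(probs, labels, n_bins=15):
    """probs: (N, C) predicted probabilities; labels: (N,) true class indices."""
    confidences = probs.max(axis=1)
    accuracies = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - confidence| gap, weighted by the fraction of samples in the bin
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece

def temperature_scale(logits, labels):
    """Fit a single temperature T on held-out logits by minimizing the NLL."""
    def nll(t):
        p = softmax(logits / t, axis=1)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    t_opt = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
    return t_opt, softmax(logits / t_opt, axis=1)
```

In a well-calibrated model the per-bin gap is near zero, which is exactly the alignment between confidence and correctness described above.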
- Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- Health & Medicine > Therapeutic Area > Oncology (0.93)
- Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Applying Informer for Option Pricing: A Transformer-Based Approach
Bańka, Feliks, Chudziak, Jarosław A.
Accurate option pricing is essential for effective trading and risk management in financial markets, yet it remains challenging due to market volatility and the limitations of traditional models like Black-Scholes. In this paper, we investigate the application of the Informer neural network for option pricing, leveraging its ability to capture long-term dependencies and dynamically adjust to market fluctuations. This research contributes to the field of financial forecasting by introducing Informer's efficient architecture to enhance prediction accuracy and provide a more adaptable and resilient framework compared to existing methods. Our results demonstrate that Informer outperforms traditional approaches in option pricing, advancing the capabilities of data-driven financial forecasting in this domain.
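Since the abstract positions the Informer against the Black-Scholes baseline, the closed-form Black-Scholes price for a European call, the traditional model such learned approaches are typically benchmarked against, is sketched below for reference.

```python
# Hedged sketch: Black-Scholes closed-form price for a European call option.
from math import log, sqrt, exp
from scipy.stats import norm

def black_scholes_call(spot, strike, maturity, rate, volatility):
    """European call price; maturity in years, rate and volatility annualized."""
    d1 = (log(spot / strike) + (rate + 0.5 * volatility ** 2) * maturity) / (volatility * sqrt(maturity))
    d2 = d1 - volatility * sqrt(maturity)
    return spot * norm.cdf(d1) - strike * exp(-rate * maturity) * norm.cdf(d2)

# e.g. black_scholes_call(spot=100, strike=105, maturity=0.5, rate=0.02, volatility=0.25)
```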
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Weather forecasting improves with AI, but we still need humans
Weather forecasts are notoriously unreliable. Most people can relate to booking a trip or making plans expecting a sunny day, only to have it disappointingly rained out. While seven-day weather forecasts are accurate about 80 percent of the time, that figure drops to around 50 percent when extended to 10 days or more. Recent staffing cuts at the National Weather Service have already led to reduced weather balloon data collection, which experts warn could further degrade forecast accuracy.
- North America > United States (1.00)
- Asia (0.15)
An Empirical Study of Many-to-Many Summarization with Large Language Models
Wang, Jiaan, Meng, Fandong, Sun, Zengkui, Liang, Yunlong, Cao, Yuxuan, Xu, Jiarong, Shi, Haoxiang, Zhou, Jie
Many-to-many summarization (M2MS) aims to process documents in any language and generate the corresponding summaries also in any language. Recently, large language models (LLMs) have shown strong multi-lingual abilities, giving them the potential to perform M2MS in real applications. This work presents a systematic empirical study on LLMs' M2MS ability. Specifically, we first reorganize M2MS data based on eight previous domain-specific datasets. The reorganized data contains 47.8K samples spanning five domains and six languages, which could be used to train and evaluate LLMs. Then, we benchmark 18 LLMs in a zero-shot manner and an instruction-tuning manner. Fine-tuned traditional models (e.g., mBART) are also evaluated for comparison. Our experiments reveal that zero-shot LLMs achieve results competitive with fine-tuned traditional models. After instruction tuning, open-source LLMs can significantly improve their M2MS ability and outperform zero-shot LLMs (including GPT-4) in terms of automatic evaluations. In addition, we demonstrate that this task-specific improvement does not sacrifice the LLMs' general task-solving abilities. However, as revealed by our human evaluation, LLMs still face the factuality issue, and instruction tuning might intensify it. Thus, controlling factual errors becomes the key challenge when building LLM summarizers for real applications, and it merits attention in future research.
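The automatic evaluation of generated summaries mentioned above is commonly operationalized with ROUGE; the sketch below scores system outputs against references using the rouge_score package, as one plausible setup (the paper may use additional or different metrics, and multilingual evaluation may require language-specific tokenization instead of English stemming).

```python
# Hedged sketch: corpus-level ROUGE F1 of generated summaries against references.
from rouge_score import rouge_scorer

def average_rouge(references, candidates):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for ref, cand in zip(references, candidates):
        scores = scorer.score(ref, cand)  # dict of Score tuples (precision, recall, fmeasure)
        for key in totals:
            totals[key] += scores[key].fmeasure
    return {key: value / len(references) for key, value in totals.items()}
```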
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Singapore (0.04)
- (11 more...)
A Structured Literature Review on Traditional Approaches in Current Natural Language Processing
Jegan, Robin, Henrich, Andreas
The continued rise of neural networks and large language models in recent years has altered the natural language processing landscape, enabling new approaches to typical language tasks and achieving mainstream success. Despite the huge success of large language models, many disadvantages remain, and in this work we assess the state of the art in five application scenarios, with a particular focus on the future perspectives and sensible uses of traditional and older approaches and techniques. We survey recent publications in the application scenarios of classification, information extraction, relation extraction, text simplification, and text summarization. After defining our terminology, i.e., which features we consider characteristic of traditional techniques in the five scenarios, we survey whether such traditional approaches are still being used and, if so, in what way. It turns out that all five application scenarios still exhibit traditional models in one way or another: as part of a processing pipeline, as a comparison/baseline to the core model of the respective paper, or as the main model(s) of the paper. For the complete statistics, see https://zenodo.org/records/13683801
- Europe > Austria > Vienna (0.14)
- Asia > China > Fujian Province > Xiamen (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- (9 more...)
- Research Report (1.00)
- Overview (1.00)
- Instructional Material > Course Syllabus & Notes (0.46)
- Information Technology (0.68)
- Energy (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection?
Yew, Samantha Min Er, Lei, Xiaofeng, Goh, Jocelyn Hui Lin, Chen, Yibing, Srinivasan, Sahana, Chee, Miao-li, Pushpanathan, Krithi, Zou, Ke, Hou, Qingshan, Da Soh, Zhi, Xue, Cancan, Yu, Marco Chak Yan, Sabanayagam, Charumathi, Tai, E Shyong, Sim, Xueling, Wang, Yaxing, Jonas, Jost B., Nangia, Vinay, Yang, Gabriel Dawei, Ran, Emma Anran, Cheung, Carol Yim-Lui, Feng, Yangqin, Zhou, Jun, Goh, Rick Siow Mong, Zhou, Yukun, Keane, Pearse A., Liu, Yong, Cheng, Ching-Yu, Tham, Yih-Chung
Background: RETFound, a self-supervised, retina-specific foundation model (FM), showed potential in downstream applications. However, its comparative performance with traditional deep learning (DL) models remains incompletely understood. This study aimed to evaluate RETFound against three ImageNet-pretrained supervised DL models (ResNet50, ViT-base, SwinV2) in detecting ocular and systemic diseases. Methods: We fine-tuned/trained RETFound and the three DL models on the full datasets, on 50% and 20% subsets, and on fixed sample sizes (400, 200, and 100 images, half of which were disease cases; for each DR severity class, 100 and 50 cases were used). Fine-tuned models were tested internally using the SEED (53,090 images) and APTOS-2019 (3,672 images) datasets and externally validated on population-based (BES, CIEMS, SP2, UKBB) and open-source datasets (ODIR-5k, PAPILA, GAMMA, IDRiD, MESSIDOR-2). Model performance was compared using the area under the receiver operating characteristic curve (AUC) and Z-tests with Bonferroni correction (P<0.05/3). Interpretation: Traditional DL models are mostly comparable to RETFound for ocular disease detection with large datasets. However, RETFound is superior in systemic disease detection with smaller datasets. These findings offer valuable insights into the respective merits and limitations of traditional models and FMs.
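For orientation, a minimal transfer-learning setup for one of the ImageNet-pretrained baselines (ResNet50) is sketched below; the class count, data loader, and training schedule are placeholders rather than the study's protocol.

```python
# Hedged sketch: fine-tuning an ImageNet-pretrained ResNet50 for binary disease
# detection, illustrating the traditional supervised transfer-learning baseline.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

def build_finetune_model(num_classes=2):
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)  # ImageNet-pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # new classification head
    return model

def train_one_epoch(model, loader, optimizer, device="cuda"):
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    for images, labels in loader:  # loader yields (B, 3, H, W) tensors and class indices
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()

# Usage (hypothetical loader):
# model = build_finetune_model()
# opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
# train_one_epoch(model, train_loader, opt)
```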
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)